The peopling of the Americas has been the subject of extensive 
genetic, archaeological and linguistic research; however, central 
questions remain unresolved. One contentious issue is whether 
the settlement occurred by means of a single migration or 
multiple streams of migration from Siberia. The pattern of 
dispersals within the Americas is also poorly understood. To 
address these questions at a higher resolution than was previously 
possible, we assembled data from 52 Native American and 
17 Siberian groups genotyped at 364,470 single nucleotide 
polymorphisms. Here we show that Native Americans descend 
from at least three streams of Asian gene flow. Most descend 
entirely from a single ancestral population that we call ‘First 
American’. However, speakers of Eskimo-Aleut languages from 
the Arctic inherit almost half their ancestry from a second stream 
of Asian gene flow, and the Na-Dene-speaking Chipewyan from 
Canada inherit roughly one-tenth of their ancestry from a third 
stream. We show that the initial peopling followed a southward 
expansion facilitated by the coast, with sequential population splits 
and little gene flow after divergence, especially in South America. A 
major exception is in Chibchan speakers on both sides of the 
Panama isthmus, who have ancestry from both North and South 
America. 

The settlement of the Americas occurred at least 15,000 years ago 
through Beringia, a land bridge between Asia and America that existed 
during the ice ages. Most analyses of Native American genetic 


diversity have examined single loci, particularly mitochondrial DNA 
or the Y chromosome, and some interpretations of these data model 
the settlement of America as a single migratory wave from Asia. We 
assembled native population samples from Canada to the southern tip 
of South America, genotyped them on single nucleotide polymorphism 
(SNP) microarrays, and merged our data with six other data sets. The 
combined data set consists of 364,470 SNPs genotyped in 52 Native 
American populations (493 samples; Fig. 1a and Supplementary 
Table 1), 17 Siberian populations (245 samples; Supplementary Fig. 1 
and Supplementary Table 2) and 57 other populations (1,613 samples) 
(Supplementary Notes). 

A complication in studying Native American genetic history is 
admixture with European and African immigrants since 1492. Cluster 
analysis shows that many of the samples we examined have some 
non-native admixture (an average of 8.5%; Fig. 1b and Supplementary 
Tables 1 and 3). This admixture is a challenge for learning about the 
historical relationships among the populations, and to address this 
complication we used three independent approaches. First, we 
restricted analyses to 163 Native Americans from 34 populations 
without evidence of admixture (Supplementary Notes). Second, we 
subtracted the expected contribution of European and African 
ancestry to the statistics we used to learn about population relation- 
ships (Supplementary Notes). Third, we inferred the probability of 
non-native ancestry at each genomic segment and ‘masked’ segments 
with more than a negligible probability of this ancestry (Fig. 1b, 
Supplementary Notes and Supplementary Fig. 2). Our inferences from 
these three approaches are concordant (Supplementary Figs 3 and 4). 

Figure 1 | Geographic, linguistic and genetic overview of 52 Native 
American populations. a, Sampling locations of the populations, with colours 
corresponding to linguistic groups. b, Cluster-based analysis (k = 4) using 
ADMIXTURE shows evidence of some West-Eurasian-related and 
sub-Saharan-African-related ancestry in many Native Americans before masking 
(top), but little afterwards (bottom). Thick vertical lines denote major linguistic 
groupings, and thin vertical lines separate individual populations. 

We built a tree (Fig. 1c) using FST distances between pairs of 
populations, which broadly agrees with geography and linguistic categories 
(trees based on masked and unmasked data were similar; Supplementary 
Fig. 3). An early split separates Asians from Native Americans and 
extreme northeastern Siberians (Chukchi, Naukan, Koryak), which is 
consistent with studies that have identified pan-American variants 
shared with some northeastern Siberians. Eskimo-Aleut speakers 
and far-northeastern Siberians form a cluster that is separated from 
other Native American populations by a long internal branch. Within 
America the tree shows a series of splits in an approximate north-south 
sequence beginning with the Arctic, followed by northern North 
America, northern/central and southern Mexico and lower Central 
America/Colombia, and ending in three South American clusters 
(the Andes, the Chaco region and eastern South America). This pattern 
of splits is consistent with a north-south population expansion, an 
inference that is also supported by the negative correlation between 
heterozygosity and distance from the Bering Strait (r = -0.48, 
P = 0.007). This correlation strengthens if we use ‘least cost distances’ 
that consider the coasts as facilitators of migration, and persists if 
we exclude four Native North American populations with ancestry 
from later streams of Asian gene flow (Supplementary Notes and 
Supplementary Fig. 5). 

c, Neighbour-joining tree based on FST distances relating Native American to 
selected non-American populations (sample sizes in parentheses). Native 
American and Siberian data were analysed after masking, but consistent trees 
were obtained on a subset of completely unadmixed samples (Supplementary 
Fig. 3). Some populations have evidence for substructure, and we represent 
these as two different groups (for example Maya1 and Maya2). 

Trees provide a simplified model of history that does not accom- 
modate the possibility of gene flow after population separation. 
Circumstantial evidence that some Native American populations 
may not fit a simple tree comes from cluster analysis, which infers 
Siberian-related ancestry in some northern North Americans (Fig. 1b), 
and from single-locus studies that have identified genetic variants 
shared between Eurasia and North America that are absent from 
South America. The advent of genome-wide data sets has allowed 
the development of a formal four-population test for whether sets of 
four populations are consistent with a tree. This test is robust to the 
ascertainment bias affecting SNP arrays. 

©2012 Macmillan Publishers Limited. All rights reserved 

Table 1 | Native Americans descend from at least three streams of Asian gene flow

                                                          P value for this many Asian streams   Minimum number of streams of Asian
                                                          being enough to explain the data      gene flow needed to explain the data
Population groupings tested                               1             2             3
East Greenland Inuit/West Greenland Inuit/First American  <10^-[?]      0.64          1         2
East Greenland Inuit/Aleutian/First American              <10^-[?]      0.57          1         2
West Greenland Inuit/Aleutian/First American              <10^-[?]      0.41          1         2
Chipewyan/East Greenland Inuit/First American             <10^-[?]      0.02          1         3
Chipewyan/West Greenland Inuit/First American             <10^-[?]      0.006         1         3
Chipewyan/Aleutian/First American                         <10^-[?]      0.03          1         3
Saqqaq/East Greenland Inuit/First American                <10^-[?]      6 × 10^-[?]   1         3
Saqqaq/West Greenland Inuit/First American                <10^-[?]      2 × 10^-[?]   1         3
Saqqaq/Aleutian/First American                            <10^-[?]      0.17          1         2
Saqqaq/Chipewyan/First American                           <10^-[?]      0.29          1         2
Saqqaq/Eskimo-Aleut/Chipewyan/First American              <10^-[?]      8 × 10^-8     0.27      3

We use the method described in Supplementary Notes to test formally whether specified groupings of Native American populations are consistent with descending from one, two or three streams of gene flow from 
Asia. We use ‘First American’ to refer to a pool of 43 populations from Meso-America southward, and ‘Eskimo-Aleut’ to refer to a pool of East and West Greenland Inuit and Aleuts. We test either three or four 
population groupings (when there are three groupings, the maximum number of streams we can reject is two, and so the P value for three streams is always 1). At least two streams of Asian gene flow are required to 
explain all rows (P < 10^-[?]). The Chipewyan, Eskimo-Aleut and First Americans can only be jointly explained by at least three streams. Analysis of the Saqqaq Palaeo-Eskimo (using about sixfold fewer SNPs than for 
the other analyses) shows that the Asian ancestry in this individual has a component that is different from that in First Americans and Greenland Inuit, but indistinguishable from the Chipewyan. 

For each of the 52 Native 
American populations in turn, we tested the hypothesis that they 
conform to the tree: ((test population, southern Native American), 
(outgroup1, outgroup2)) for 45 pairs of ten Asian outgroups. We used 
a Hotelling T-test to evaluate whether all four-population test f4 
statistics of this form are consistent with the expectation of zero 
(Supplementary Notes). The test is not significant for 47 populations, 
which is consistent with their stemming from the same, presumably 
first, wave of American settlement; we call this ancestry ‘First 
American’ (Table 1). In contrast, four populations from northern 
North America show highly significant evidence of ancestry from 
additional streams of gene flow from Asia, subsequent to the initial 
peopling of America, which we confirm through the Hotelling T-test 
and a complementary test (Supplementary Notes): East Greenland 
Inuit (P < 10^-[?]), West Greenland Inuit (P < 10^-[?]), Aleutian 
Islanders (P = 9 × 10^-[?]) and Chipewyan (P < 10^-[?]). The recently 
sequenced genome of a 4,000-year-old Saqqaq Palaeo-Eskimo from 
Greenland also has evidence of ancestry that is distinct from more 
southern Native Americans (P = 2 × 10^-[?]) (Supplementary Notes). 
Examination of the values of the f4 statistics allows us to infer the 
minimum number of gene flow events from Asia into America 
consistent with the data. Each stream of gene flow is expected to produce a 
distinct vector of f4 statistics, constituting a ‘signature’ of how the 
ancestral migrating population relates to present-day Asian popula- 
tions. By finding the minimum number of vectors whose linear com- 
binations are necessary to produce the vector observed in each 
population, we infer that a minimum of three gene flow events from 
Asia are necessary to explain the data from all Native American popu- 
lations jointly, including the Saqqaq Palaeo-Eskimo (Supplementary 
Notes). These three episodes correspond to First American ancestry 
(distributed throughout the Americas) and to two additional streams 
of gene flow detected in a subset of northern North Americans 
(East Greenland Inuit, West Greenland Inuit, Aleutian Islanders, 
Chipewyan and Saqqaq). Table 1 shows that f4 statistics in the Inuit 
and Aleutian islanders are consistent with deriving the non-First- 
American portions of their ancestry from the same later stream of 
Asian gene flow, providing support for deep shared ancestry between 
these linguistically linked groups. The Na-Dene-speaking 
Chipewyan have a different pattern of f4 statistics from Eskimo- 
Aleut speakers, implying that they descend at least in part from a 
separate stream of Asian gene flow (P < 10^-[?] for comparisons with 
the Greenland Inuit; Table 1). This is consistent with the hypothesis 
that Na-Dene languages mark a distinct migration from Asia. 
Because we only have data from one Na-Dene-speaking group, an 
important direction for future work will be to test whether the distinct 
Asian ancestry that we detect in the Chipewyan is a shared signature 
throughout Na-Dene speakers. Finally, the Saqqaq have a vector of f4 
statistics consistent with that in the Chipewyan, raising the possibility 


that the Saqqaq and Chipewyan both carry genetic material from the 
same later stream of Asian gene flow into the Americas, postdating the 
First American migration (Supplementary Notes). 
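
As a rough illustration of the four-population statistics discussed above, the following sketch computes f4(southern Native American, test population; outgroup1, outgroup2) for all 45 pairs of ten outgroups. It assumes the usual definition of f4 as the average over SNPs of the product of two allele frequency differences. The allele frequencies, population names and noise level are hypothetical toy values, not data from this study, and the Hotelling T-test step is not implemented here.

```python
from itertools import combinations

import numpy as np


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return float(np.mean((p_a - p_b) * (p_c - p_d)))


rng = np.random.default_rng(0)
n_snps = 5000

# Hypothetical allele frequencies for ten Asian outgroups (toy values only).
outgroups = {f"asian_{i}": rng.uniform(0.05, 0.95, n_snps) for i in range(10)}

# Hypothetical southern reference, and a test population that differs from it
# only by noise that is independent of the outgroups (a tree-like case).
southern = rng.uniform(0.05, 0.95, n_snps)
test_pop = np.clip(southern + rng.normal(0.0, 0.02, n_snps), 0.0, 1.0)

# All 45 (= 10 * 9 / 2) unordered pairs of outgroups.
outgroup_pairs = list(combinations(outgroups, 2))

# One f4 value per outgroup pair; under a tree each has expectation zero.
f4_vector = np.array([
    f4(southern, test_pop, outgroups[x], outgroups[y])
    for x, y in outgroup_pairs
])

print(len(outgroup_pairs), float(np.abs(f4_vector).max()))
```

In this toy case every entry of the vector is close to zero; a population carrying ancestry from an additional Asian stream would instead be expected to show a consistent non-zero pattern across outgroup pairs.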

Figure 2 | Distinct streams of gene flow from Asia into America. We present 
an AG that gives no evidence of being a poor fit to the data and is consistent 
with three streams of Asian gene flow into America. Solid points indicate 
inferred ancestral populations, drift on each lineage is given in units 
proportional to 1,000 × FST, and mixture events (dotted lines) are denoted by 
the percentage of ancestry. The Asian lineage leading to First Americans is the 
most deeply diverged, whereas the Asian lineages leading to Eskimo-Aleut 
speakers and the Na-Dene-speaking Chipewyan are more closely related and 
descend from a common Siberian ancestral population that is a sister group to 
the Han. The inferred ancestral populations are indicated by filled circles, and 
the lineages descending from them are coloured: First American (blue), 
ancestors of the Na-Dene-speaking Chipewyan (green), and Eskimo-Aleut 
(red). The model also infers a migration of people related to Eskimo-Aleut 
speakers across the Bering Strait, thus bringing First American genes to Asia 
(the Naukan are shown, but the Chukchi show a similar pattern; 
Supplementary Notes). 

To develop an explicit model for the settlement of the Americas, we 
used the admixture graph (AG) framework. AGs are generalizations 
of trees that accommodate the possibility of a limited number of 
unidirectional gene flow events. They are powerful tools for learning 
about history because they make predictions about the values of 
f-statistics (such as f4) that can be used to test the fit of a proposed 
model (Supplementary Notes). Figure 2 presents an AG relating 
selected Native American and Old World populations that is a good 
fit to the data in the sense that none of the f-statistics predicted by the 
model are more than three standard errors from what is observed. This 
supports the hypothesis of three deep lineages in Native Americans: 
the Asian lineage leading to First Americans is the most deeply 
diverged, whereas the Asian lineages leading to Eskimo-Aleut speakers 
and the Na-Dene-speaking Chipewyan are more closely related and 
descend from a putative Siberian ancestral population more closely 
related to Han (Fig. 2). We also find that Eskimo- 
Aleut populations and the Chipewyan derive large proportions of their 
genomes from First American ancestors: an estimated 57% for 
Eskimo-Aleut speakers, and 90% in the Chipewyan, probably reflect- 
ing major admixture events of the two later streams of Asian migration 
with the First Americans that they encountered after they arrived 
(Supplementary Notes). The high proportion of First American 
ancestry explains why Eskimo-Aleut and Chipewyan populations 
cluster with First Americans in trees like that in Fig. 1c despite having 
some of their ancestry from later streams of Asian migration, and 
explains the observation of some genetic variants that are shared by 
all Native Americans but are absent elsewhere. We also infer 
back-migration of populations related to the Eskimo-Aleut from 
America into far-northeastern Siberia (we obtain an excellent fit to 
the data when we model the Naukan and coastal Chukchi as mixtures 
of groups related to the Greenland Inuit and Asians (Fig. 2 and 
Supplementary Notes)). This explains previous findings of pan- 
American alleles also in far-northeastern Siberia. 

We next used AGs to develop a model for the history of populations 
who derive all their ancestry from the First American migration, with 
no ancestry from subsequent streams of Asian gene flow. Figure 3 
presents an AG we built for 16 selected Native American populations 
and two outgroups, which is a good fit to the data in that the largest 
|Z|-score for a difference between the observed and predicted 
f-statistics is 3.2 from among the 11,781 statistics we tested 
(Supplementary Notes). (The AG of Fig. 3 used masked data; however, a 
consistent set of relationships is inferred for unadmixed samples 
(Supplementary Fig. 4).) This model provides a greatly improved 
statistical fit to the data compared with the tree of Fig. 1c and leads 
to several novel inferences. First, a relatively large fraction of South 
American populations fit the AG without a need for admixture events, 
which we speculate reflects a history of limited gene flow among these 
populations since their initial divergence. In contrast, only a small 
fraction of Meso-American populations fit into the AG, which could 
reflect either a higher rate of migration among neighbouring groups or 
our denser sampling in Meso-America allowing us to detect more 
subtle gene flow events. Second, some Meso-American populations 
have experienced very little genetic drift since divergence from the 
common ancestral population with South Americans (adding up the 
genetic drifts along the relevant edges of Fig. 3, we infer FST = 0.014 
between the Zapotec and a hypothetical population ancestral to all of 
Central and South America), suggesting that effective population sizes 
in Meso-America have been relatively large since settlement of the 
region. Third, the model infers three admixture events consistent with 
geographic locations and linguistic affiliations (Supplementary Notes). 
The Inga have both Amazonian and Andean ancestry, which is con- 
sistent with their speaking a Quechuan language but living in the 
eastern Andean slopes of Colombia and thus interacting with groups 
in the neighbouring Amazonian lowlands. The Guarani stem from two 
distinct strands of ancestry within eastern South America. The most 
striking admixture event is in the Costa Rican Cabecar (Fig. 3) and 
other Chibchan-speaking populations (Supplementary Notes) from 
the Isthmo-Colombian area. One of the lineages that we detect in these 
populations occurs definitively within the radiation of South American 
populations, and so the presence of these populations in lower Central 
America suggests that there was reverse gene flow across the Panama 
isthmus after the initial settlement of South America. There has been 
controversy about whether Chibchan speakers of lower Central 
America represent direct descendants of the first settlers in the region 
or more recent migration across the isthmus, and our results support 
the view that more recent migration has contributed most of these 
populations’ ancestry. 

Figure 3 | A model fitting populations of entirely First American ancestry. 
We show an AG depicting the relationships between 16 selected Native 
American populations with entirely First American ancestry along with two 
outgroups (Yoruba and Han). The Colombian Inga are modelled as a mixture 
of Andean and Amazonian ancestry. The Paraguayan Guarani are fitted as a 
mixture of separate strands of ancestry from eastern South America. The 
Central American Cabecar are modelled as a mixture of strands of ancestry 
related to South Americans and to North Americans, supporting 
back-migration from South into Central America. The colouring of edges indicates 
alternative insertion points for the admixing lineages leading to the Cabecar 
that produce a similar fit to the data in the sense that the χ² statistic is within 
3.84 of the AG shown. The red colouring shows that the South American 
lineage contributing to the Cabecar split off after the divergence of the Andean 
populations, and the blue colouring shows that the other lineage present in the 
Cabecar diverged before the separation of Andeans. Estimated admixture 
proportions are shown along the dotted lines, and lineage-specific drift 
estimates are in units proportional to 1,000 × FST. 

This is the most comprehensive survey of genetic diversity in Native 
Americans so far. Our analyses show that the great majority of Native 
American populations—from Canada to the southern tip of Chile— 
derive their ancestry from a homogeneous ‘First American’ ancestral 
population, presumably the one that crossed the Bering Strait more 
than 15,000 years ago. We also document at least two additional 
streams of Asian gene flow into America, allowing us to reject the view 
that all present-day Native Americans stem from a single migration 
wave, and supporting the more complex scenarios proposed by 
some other studies. In particular, the three distinct Asian lineages 
we detect—‘First American’, ‘Eskimo-Aleut’ and a separate one in the 
Na-Dene-speaking Chipewyan—are consistent with a three-wave 
model proposed mostly on the basis of dental morphology and a 
controversial interpretation of the linguistic data. However, our 
analyses also document extensive admixture between First Americans 
and the subsequent streams of Asian migrants, which was not predicted 
by that model, such that Eskimo-Aleut speakers and the Chipewyan 
derive more than half their ancestry from First Americans. Further 
insights into Native American history will benefit from the application 
of analyses similar to those performed here to whole-genome 
sequences and to data from the many admixed populations in the 
Americas that do not self-identify as native. 


METHODS SUMMARY 


The DNA samples we analysed were collected over several decades. For each 
sample we verified that informed consent was obtained consistent with studies 
of population history and that institutional approval had been obtained in the 
country of collection. Ethical oversight and approval for this project was provided 
by the National Health Service National Research Ethics Service, Central London 
committee (reference no. 05/Q0505/31). The data set is based on merging Illumina 
SNP array data newly generated for this study (including 273 Native American 
samples) with data from six other studies. We applied stringent data curation and 
validation procedures to the merged data set. We used local ancestry inference 
software to identify genome segments in each Native American and Siberian 
sample without evidence of recent European or African admixture, and created 
a data set that masked segments of potentially non-native origin. Most analyses are 
performed on the masked data set; however, we confirmed major inferences on a 
subset of 163 Native American samples that had no evidence of European or 
African admixture. We used model-based clustering and neighbour-joining 
trees to obtain an overview of population relationships, and then tested whether 
proposed sets of four populations were consistent with having a simple tree 
relationship using the four-population test, which we generalized by means of a 
Hotelling T-test. We analysed the correlation in allele frequency differences across 
populations to infer the minimum number of gene flow events that occurred 
between Asia and America. We fitted the patterns of correlation in allele frequency 
differences to proposed models of history (AGs) that can incorporate population 
splits and mixtures. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


DNA samples. The samples analysed here were collected for previous studies over 
several decades. We reviewed the documentation available for each population to 
confirm that all samples were collected with informed consent encompassing 
genetic studies of population history. Institutional approval for use of each set 
of samples in such research was obtained before this study in the country of 
collection. Approval for this study was also provided by the National Research 
Ethics Service, Central London REC 4 (reference no. 05/Q0505/31). 
Genotyping. All samples were genotyped by using Illumina arrays, and the data 
set analysed here is the result of merging data from seven different sources 
(Supplementary Notes). The genotyping conducted specifically for this study 
was performed at the Broad Institute of Harvard and Massachusetts Institute of 
Technology, with the exception of ten Chipewyan samples that were genotyped at 
McGill University (no systematic differences were observed between these and the 
five Chipewyan samples genotyped at the Broad Institute). Supplementary Table 3 
specifies details for each of the 493 Native American samples. A total of 419 
samples were genotyped from genomic DNA, and 74 from whole-genome- 
amplified material prepared using the Qiagen REPLI-g midi kit. 

Data curation. We required more than 95% genotyping completeness for each SNP 
and sample. We merged the data specifically obtained for this study with six other 
data sets. We further removed samples that were outliers in principal-component 
analysis relative to others from their group, showed an excess rate of heterozygotes in 
comparison with the expected rate from the allele frequencies in the population, or 
had evidence of being a second-degree relative or closer to another sample in the 
study (Supplementary Notes). Genetic analyses summarized in the Supplementary 
Notes found substructure in some populations (Maya, Zapotec and Nganasan); we 
use labels such as ‘Maya1’ and ‘Maya2’ to indicate the subgroups. 
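
The completeness requirement described above can be sketched as follows. This is a minimal illustration rather than the curation pipeline used for the study: the genotype coding, the missing-value code `MISSING` and the order of filtering (SNPs first, then samples) are assumptions made for the example, and the toy matrix is invented.

```python
import numpy as np

MISSING = -1             # hypothetical code for an uncalled genotype
MIN_COMPLETENESS = 0.95  # "more than 95% genotyping completeness"


def completeness_filter(genotypes):
    """Keep SNPs, then samples, whose call rate exceeds the threshold.

    genotypes: array of shape (n_samples, n_snps).
    Returns the filtered matrix plus boolean masks of kept samples and SNPs.
    """
    g = np.asarray(genotypes)
    called = g != MISSING
    keep_snps = called.mean(axis=0) > MIN_COMPLETENESS
    # Sample call rates are evaluated on the SNPs that survived the first step.
    keep_samples = called[:, keep_snps].mean(axis=1) > MIN_COMPLETENESS
    return g[np.ix_(keep_samples, keep_snps)], keep_samples, keep_snps


# Toy data: 40 samples x 40 SNPs, all genotypes called except a few entries.
toy = np.zeros((40, 40), dtype=int)
toy[0:4, 0] = MISSING   # SNP 0 is called in only 36 of 40 samples (90%)
toy[5, 1:4] = MISSING   # sample 5 lacks 3 of the remaining 39 SNPs (about 92%)
toy[7, 4] = MISSING     # sample 7 lacks a single SNP and passes

filtered, kept_samples, kept_snps = completeness_filter(toy)
print(filtered.shape)
```

Filtering SNPs before samples is one possible choice; applying the two filters in the other order, or iterating them, could retain a slightly different set.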

Masking of genomic segments containing non-Native American ancestry. For 
each Native American individual, we used HAPMIX to model their haplotypes 
with two ancestral panels: first, ‘Old World’ populations (a pool of 408 Europeans 
and 130 West Africans) and second, ‘Native’ populations, a pool of all Native 
American and Siberian populations. Haplotype phase in the ancestral panel, which 
is necessary for HAPMIX, was determined by phasing both pools of samples 
together with Beagle. We masked genome segments that had an expected number 
of more than 0.01 non-Native American chromosomes according to HAPMIX, 
thus retaining only segments with an extremely high nominal probability of being 
homozygous for native ancestry. Multiple analyses reported in Supplementary 
Information indicate that our masking procedure produces inferences about 
history that are consistent with those based on unadmixed samples. 
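
The masking rule can be illustrated with a short sketch. It assumes that the local ancestry output has already been reduced to an expected number of non-native chromosomes (between 0 and 2) at each SNP for one individual; that input format, the function name `mask_genotypes` and the missing-value code are hypothetical and do not describe the actual HAPMIX output files.

```python
import numpy as np

MASK_THRESHOLD = 0.01  # expected non-native chromosomes, as stated in the text
MISSING = -1           # hypothetical code for a masked genotype


def mask_genotypes(genotypes, expected_non_native):
    """Set genotypes to MISSING wherever the expected number of
    non-native chromosomes exceeds MASK_THRESHOLD."""
    masked = np.asarray(genotypes).copy()
    expected = np.asarray(expected_non_native, dtype=float)
    masked[expected > MASK_THRESHOLD] = MISSING
    return masked


# Toy example for one individual at five SNPs (invented values).
genotypes = np.array([0, 1, 2, 1, 0])
expected_non_native = np.array([0.0, 0.005, 0.01, 0.02, 1.3])

print(mask_genotypes(genotypes, expected_non_native))
```

Only positions with an expectation strictly above 0.01 are masked in this sketch, so the third SNP (exactly 0.01) is retained.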

Population structure analysis, FST and neighbour-joining tree. We used 
EIGENSOFT to perform PCA and compute pairwise population FST (ref. 33). 
Clustering was performed with ADMIXTURE. A neighbour-joining tree based 
on FST was built with POWERMARKER. 

Linguistic categories. We used Greenberg’s classification. We considered 
using alternative classifications; however, others (for example that in ref. 37) do 
not propose links between languages at a deep enough level to compare with 
genetic relationships on a continent-wide scale. 

Correlating geography with population diversity. Euclidean distances between the 
Bering Strait (64.8°N, 177.8°E) and the location of each population 
(Supplementary Table 1) were calculated by using great arc distances based on a 
Lambert azimuthal equal-area projection. Least-cost distances between the same 
points were computed with PATHMATRIX, which allowed us to build a spatial 
cost map incorporating the coastal outline of the Americas. We compared the 
following coastal/inland relative costs: 1:2, 1:5, 1:10, 1:20, 1:30, 1:40, 1:50, 1:100, 
1:200, 1:300, 1:400 and 1:500. We computed a Pearson correlation coefficient 
between heterozygosity for each population and their least-cost distance from 
the Bering Strait (Supplementary Notes). 
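
The distance and correlation calculation can be sketched as follows. This is an illustration under stated assumptions: it uses the haversine great-circle formula on a sphere of radius 6,371 km in place of the projection-based and least-cost distances described above, and the population coordinates and heterozygosity values are invented toy numbers, not those of Supplementary Table 1.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0       # assumed mean Earth radius
BERING_STRAIT = (64.8, 177.8)  # latitude N, longitude E, as given in the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical populations: (latitude, longitude, heterozygosity).
toy_populations = {
    "north": (60.0, -110.0, 0.27),
    "meso": (17.0, -96.0, 0.25),
    "andes": (-13.0, -72.0, 0.23),
    "east_south": (-10.0, -52.0, 0.21),
}

distances = [great_circle_km(*BERING_STRAIT, lat, lon)
             for lat, lon, _ in toy_populations.values()]
heterozygosity = [h for _, _, h in toy_populations.values()]

r_toy = pearson_r(distances, heterozygosity)
print(round(r_toy, 3))
```

A least-cost analysis would replace `great_circle_km` with path lengths computed on a cost surface that makes coastal cells cheaper to cross than inland cells, using the relative costs listed above.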

Documentation of at least three streams of gene flow from Asia to America. 
We used the four-population test to assess whether proposed sets of four populations 





were consistent with a tree. For each of 52 test populations, we assessed their 
consistency with deriving from the same Asian source population as southern 
Native Americans by studying statistics of the form f4(southern Native 
American, test population; outgroup1, outgroup2), where the two outgroups range over 
the 45 (= 10 × 9/2) possible pairs of ten Asian outgroups (Han Chinese and nine 
Siberian populations with at least ten samples each, and not including the Naukan 
and Chukchi whom we showed to have some First American ancestry as a result of 
back-migration across the Bering Strait, making them inappropriate as outgroups 
(Supplementary Notes)). We applied a Hotelling T-test to assess whether the 
ensemble of all possible f4 statistics was consistent with zero after taking into 
account their correlation structure, resulting in a single hypothesis test for whether 
the test population was consistent with having the same relationship to the panel of 
Asian populations as the set of southern Native American samples used as a 
reference group. We also generalized this test by studying the matrix of all f4 
statistics simultaneously and computing statistics that measured whether the f4 
statistics seen in proposed sets of Native American populations were consistent 
with deriving from a specified number of Asian migrations. In Supplementary 
Notes we show that if there have been N distinct streams of gene flow from Asia 
into the Americas, then the matrix of all possible f4 statistics can have rank no more 
than N - 1 (ignoring sampling noise). The case N = 1 reduces to calculating a 
Hotelling T² statistic. We also developed a likelihood ratio test, generalizing the 
Hotelling T-test, to evaluate the statistical evidence for larger values of N, allowing 
us to estimate the minimum number of exchanges between Asia and America that 
are needed to explain the genetic data. 
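
The rank argument can be illustrated numerically. In this noise-free sketch, each population's vector of f4 statistics is a weighted sum of the signature vectors of the later streams, because f4 is linear in allele frequencies and the First American component contributes zero relative to the southern reference. The signature vectors are random, hypothetical values; the mixture weights follow the rounded ancestry proportions quoted in the main text (57% and 90% First American). The likelihood ratio test itself, which handles sampling noise, is not implemented here.

```python
import numpy as np


def numerical_rank(matrix, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    singular = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    if singular[0] == 0.0:
        return 0
    return int(np.sum(singular > rel_tol * singular[0]))


rng = np.random.default_rng(1)
n_pairs = 45  # one column per pair of Asian outgroups

# Hypothetical f4 signatures of the second and third streams, measured
# relative to the First American stream (toy values).
signatures = rng.normal(0.0, 1e-3, size=(2, n_pairs))

# Fraction of ancestry from each later stream (second, third).
later_stream_weights = np.array([
    [0.00, 0.00],  # entirely First American
    [0.43, 0.00],  # Eskimo-Aleut-like: 57% First American
    [0.00, 0.10],  # Chipewyan-like: 90% First American
])

# Rows: populations; columns: outgroup pairs.
f4_matrix = later_stream_weights @ signatures

print(numerical_rank(f4_matrix), numerical_rank(f4_matrix[:2]))
```

With three streams (N = 3) the toy matrix has rank 2, and dropping the Chipewyan-like row leaves a rank-1 matrix, which matches the bound of N - 1 stated above. With real data the singular values are never exactly zero, which is why a formal test is needed.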

Admixture graphs. We used the AG framework to fit models of population 
separation followed by mixture to the data. An AG makes predictions about the 
correlations in allele frequency differentiation statistics (f-statistics) that will be 
observed between all pairs, triples and quadruples of populations, and these can 
be compared with the observed values (along with a standard error from a Block 
Jackknife) to test hypotheses about population relationships (Supplementary 
Notes). We do not have a formal goodness-of-fit test for whether a given AG fits 
the data correcting for the number of hypotheses tested and number of degrees of 
freedom, but use two approximations. First, we examine individual f-statistics, 
searching for those that are more than three standard errors from expectation, 
which would indicate a poor fit. Second, we compute a χ² statistic for the match between the 
observed and predicted f-statistics, taking into account the empirical covariance 
matrix among the f-statistics computed on the basis of a Block Jackknife. This 
results in a nominal P value, but it is unclear to us at present whether the empirical 
covariance matrix that we obtain can be equated with the theoretical covariance 
matrix that is needed to compute a formal P value. For a fixed graph complexity 
(number of drift edges and admixture weights), however, we can compare the 
χ² value for different admixture graphs to obtain a formal test for whether some 
topologies are significantly better fits; this results in the colouring of edges in 
Fig. 3, which shows alternative insertion points for admixture edges that are 
equally good fits. 
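
The two approximate checks described above can be sketched in a few lines. The jackknife below is the simple delete-one version for equally sized blocks, which is an assumption made for brevity (genomic blocks usually differ in size and call for a weighted variant). The observed and predicted values are invented. The last function only confirms that 3.84 is approximately the 95% point of a χ² distribution with one degree of freedom.

```python
import math

import numpy as np


def block_jackknife_se(block_values):
    """Delete-one jackknife standard error of the mean over equal-size blocks."""
    blocks = np.asarray(block_values, dtype=float)
    n = blocks.size
    leave_one_out = (blocks.sum() - blocks) / (n - 1)
    spread = np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    return float(np.sqrt((n - 1) / n * spread))


def poorly_fitting(observed, predicted, std_err, threshold=3.0):
    """Indices of statistics more than `threshold` standard errors from
    the model prediction, together with all Z-scores."""
    z = (np.asarray(observed) - np.asarray(predicted)) / np.asarray(std_err)
    return np.flatnonzero(np.abs(z) > threshold), z


def chi2_1df_tail(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return 1.0 - math.erf(math.sqrt(x / 2.0))


# Invented example: three f-statistics, their model predictions and standard errors.
observed = [0.0010, 0.0031, -0.0004]
predicted = [0.0008, 0.0010, -0.0005]
std_err = [0.0002, 0.0005, 0.0003]

flagged, z_scores = poorly_fitting(observed, predicted, std_err)
print(flagged, np.round(z_scores, 2))
print(round(chi2_1df_tail(3.84), 3))
```

Here only the second statistic would be flagged as a sign of poor fit. Comparing two graphs of equal complexity then amounts to checking whether their χ² values differ by more than 3.84.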